EDA

For consistency, all data from here on covers 2006-2020 so that time spans match across series. All code for this tab can be found here.

Solar Radiation and Weather Data

Time Series of GHI

Here we examine the main form of solar radiation of interest to us, GHI (global horizontal irradiance), the solar radiation that solar panels make use of.

Trend: The plot shows a stable and consistent seasonal pattern, which is good news for all of us. The pattern holds steady over time, indicating no major trend.

Seasonality: We can see consistent peaks in GHI during the summer time, always peaking in June around the summer solstice, and always at a trough around the winter solstice in December.

Periodic Variation: There are plenty of instances of variation from the overall seasonal cycle that can be caused by a few factors. Clouds can block or scatter the sun’s rays, causing changes in the amount of GHI. Atmospheric conditions like dust, pollution, and other particles in the air can affect the amount of GHI that reaches the Earth’s surface. Weather patterns such as storms, high-pressure systems, and fronts can cause short-term fluctuations. Lastly, the sun operates on 11-year solar cycles, during which the sun’s activity changes due to the number of sunspots, magnetic storms, and solar flares, which can impact the amount of energy and particles that the sun releases into space.

Additive Time Series: This is an additive time series because the seasonal swings stay roughly the same size over time instead of growing or shrinking with the level of the series. With no clear trend over the time span and roughly constant variance, an additive model is a reasonable choice.

Lag Plots

Lag plots are a useful tool for visualizing patterns in time series data. They help to detect the presence of autocorrelation, which occurs when the value of a time series at a given point is related to the values at previous points.

In a lag plot, each data point is plotted against a lagged version of itself, with the x-axis representing the original time series values and the y-axis representing the lagged values or vice versa. The resulting scatter plot can then be used to identify any patterns or relationships between the original and lagged values.

Lag plots are useful because they provide a visual representation of the relationships between the values of a time series, which can be helpful in identifying patterns such as seasonality, trend, or autocorrelation. Autocorrelation can have a significant impact on time series forecasting, as it can indicate that the values of the series are dependent on previous values and not just random fluctuations.
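The construction described above can be sketched in a few lines. This is a minimal illustration on a synthetic daily sine wave with a 365-day cycle, not the GHI data; the helper names `pearson` and `lag_pairs` are our own and do not come from the project code.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def lag_pairs(series, lag):
    """Return (y_{t-lag}, y_t) pairs: the points drawn in a lag plot."""
    return list(zip(series[:-lag], series[lag:]))

# Synthetic daily series with a 365-day seasonal cycle (illustrative only).
series = [math.sin(2 * math.pi * t / 365) for t in range(5 * 365)]

lag_corr = {}
for lag in (1, 91, 182, 365):
    lagged, current = zip(*lag_pairs(series, lag))
    lag_corr[lag] = pearson(lagged, current)
    print(f"lag {lag:>3}: correlation = {lag_corr[lag]:+.3f}")
```

On a purely seasonal series like this one, the correlation is strongly positive at short lags and at a full year, near zero at a quarter year, and strongly negative at half a year.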

Below is a lag plot of GHI:

Here we see some expected patterns. A lag of 1 day exhibits very high positive correlation, indicating that yesterday’s solar radiation is useful for predicting today’s. At a lag of 1 month, we see strong positive correlation, but less than at one day: a one-month lag usually falls within the same season and thus has similar dynamics with regard to the orientation of the earth and sun. At a 3-month lag we see essentially no correlation, because this compares adjacent seasons, where the relationship can be either positive or negative. At 6 months, we see a strong negative relationship because halfway around the calendar solar radiation is in the opposite phase of its cycle, for instance low radiation in the winter against high radiation in the summer. At 9 months we see a repeat of the 3-month relationship for the same reason: it pairs each season with an adjacent season from the prior year (winter 2015 on spring 2014, etc.). Lastly, at a lag of 360 days, nearly a full year, we see a strong positive correlation again.

Decomposition

This additive decomposition gives insight into solar radiation. The trend component is very consistent, staying within a range of approximately 15 GWh over the 2006-2020 period. The plot also makes clear the consistent seasonal element of GHI, which fluctuates between peaks in the summer and troughs in the winter.
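For readers who want to see what an additive decomposition does mechanically, here is a minimal sketch of the classical approach: estimate the trend with a centered moving average, average the detrended values by position in the cycle to get the seasonal component, and keep what is left as the remainder. The function names and the synthetic monthly series are assumptions for illustration; the project itself may have used a different routine.

```python
import math

def centered_ma(y, period):
    """Centered moving average; uses a 2 x period window when period is even."""
    half = period // 2
    trend = [None] * len(y)
    for t in range(half, len(y) - half):
        window = y[t - half : t + half + 1]
        if period % 2 == 0:
            # Half weight on the two end points so the window stays centered.
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        trend[t] = total / period
    return trend

def additive_decompose(y, period):
    """Split y into trend + seasonal + remainder (classical additive form)."""
    trend = centered_ma(y, period)
    buckets = [[] for _ in range(period)]
    for t, (obs, tr) in enumerate(zip(y, trend)):
        if tr is not None:
            buckets[t % period].append(obs - tr)
    raw = [sum(b) / len(b) for b in buckets]
    mean_raw = sum(raw) / period
    cycle = [s - mean_raw for s in raw]  # seasonal effects sum to zero
    seasonal = [cycle[t % period] for t in range(len(y))]
    remainder = [None if tr is None else obs - tr - s
                 for obs, tr, s in zip(y, trend, seasonal)]
    return trend, seasonal, remainder

# Synthetic monthly series: linear trend plus a fixed seasonal swing.
y = [100 + 0.5 * t + 10 * math.sin(2 * math.pi * t / 12) for t in range(72)]
trend, seasonal, remainder = additive_decompose(y, period=12)
```

Because the synthetic series is exactly trend plus season, the remainder here is numerically zero; on real data it carries the noise.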

Moving Average Smoothing

Moving averages are another useful way to separate the trend from the overall data. They work by sliding an averaging window over the data. For instance, the 30-day MA is calculated by averaging the roughly 30 data points around the target data point (15 before and 15 after). This has the effect of removing noise from the data. The MA plot here shows more of the same: the 30- and 180-day MAs clearly show the cyclical/seasonal nature of the data, and the 1-year MA shows a truly flat trend in the context of the data as a whole.
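The sliding-window idea can be sketched as follows. This is an illustrative implementation on synthetic data, assuming an odd window so that it can be centered exactly on a point (15 before, the point itself, and 15 after gives 31 points); the function name and the simulated series are our own.

```python
import math
import random

def centered_moving_average(y, window):
    """Average the `window` points centered on each index (odd window assumed).
    Positions without a full window are returned as None."""
    if window % 2 == 0:
        raise ValueError("use an odd window so it can be centered on a point")
    half = window // 2
    out = [None] * len(y)
    for t in range(half, len(y) - half):
        out[t] = sum(y[t - half : t + half + 1]) / window
    return out

def spread(values):
    """Range (max minus min) of the non-missing values."""
    vals = [v for v in values if v is not None]
    return max(vals) - min(vals)

# Synthetic daily series: annual cycle plus noise (not the GHI data).
rng = random.Random(0)
series = [10 * math.sin(2 * math.pi * t / 365) + rng.gauss(0, 3)
          for t in range(3 * 365)]

ma_31 = centered_moving_average(series, 31)    # smooths noise, keeps the season
ma_365 = centered_moving_average(series, 365)  # averages out the season too

print(spread(series), spread(ma_31), spread(ma_365))
```

The short window keeps the seasonal swing while damping day-to-day noise; the one-year window spans a full cycle, so the seasonal component cancels and only the (here flat) trend remains.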

ACF and PACF

ACF (Autocorrelation Function) and PACF (Partial Autocorrelation Function) are two commonly used plots in time series analysis to help identify the underlying structure of a time series.

The ACF plot shows the correlation between a time series and lagged versions of itself, which helps to identify if there is a pattern in the time series that repeats itself over time. If there is a strong correlation at a specific lag, it suggests that there is an autocorrelation in the series.

On the other hand, the PACF plot shows the correlation between a time series and lagged versions of itself after removing the effect of intermediate lags. It can help identify the number of autoregressive (AR) terms in an ARIMA (AutoRegressive Integrated Moving Average) model, which is a common time series forecasting model.

The ACF plot once again displays the intense seasonality, with strong autocorrelations at almost every lag. The PACF shows strong correlation for the first handful of lags and then becomes insignificant. These plots clearly display a non-stationary time series, as there are intense autocorrelations among lagged values.
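To make the difference between the two functions concrete, the sketch below computes a sample ACF directly and derives the PACF from it with the Durbin-Levinson recursion. It runs on a simulated AR(1) series with coefficient 0.7, chosen only for illustration; the function names are ours, and plotting routines in statistical packages may differ in small details.

```python
import random

def acf(y, max_lag):
    """Sample autocorrelations r[0..max_lag]."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]

def pacf_from_acf(r):
    """Partial autocorrelations via the Durbin-Levinson recursion."""
    pacf = [1.0]
    prev = []  # AR coefficients of the order (k - 1) fit
    for k in range(1, len(r)):
        num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

# Simulated AR(1): y_t = 0.7 * y_{t-1} + noise (illustrative, not project data).
rng = random.Random(1)
y = [0.0]
for _ in range(1999):
    y.append(0.7 * y[-1] + rng.gauss(0, 1))

r = acf(y, 5)
p = pacf_from_acf(r)
for k in range(1, 6):
    print(f"lag {k}: ACF = {r[k]:+.3f}  PACF = {p[k]:+.3f}")
```

For an AR(1) process the ACF decays gradually while the PACF is large only at lag 1, which is the pattern used to read off the number of AR terms.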

Stationarity Test

An Augmented Dickey-Fuller (ADF) test does not reject non-stationarity at the 5% level (p = 0.054), which is consistent with the ACF plot referenced above. The test comes so close to indicating stationarity (a p-value below .05 is all that is needed) because it is largely concerned with whether the series reverts to a consistent mean, which the GHI time series appears to do. Because the main non-stationarity is a result of seasonality, we also run a KPSS test. Note that the KPSS null hypothesis is the reverse of the ADF one: it assumes level stationarity, so its p-value of 0.1 means level stationarity is not rejected. Taken together, the tests point to a stable mean, and the case for non-stationarity rests on the seasonal autocorrelation seen in the ACF rather than on either test.


    Augmented Dickey-Fuller Test

data:  radiation
Dickey-Fuller = -3.3957, Lag order = 17, p-value = 0.05411
alternative hypothesis: stationary

    KPSS Test for Level Stationarity

data:  radiation
KPSS Level = 0.070398, Truncation lag parameter = 10, p-value = 0.1
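To give a feel for what the Dickey-Fuller statistic measures, here is a deliberately simplified sketch: it regresses the change in the series on the previous level (with a constant) and reports the t-statistic on that coefficient. Unlike the augmented test reported above, it includes no lagged differences and no trend term, so its numbers are not comparable to that output; the function name and simulated series are assumptions for illustration.

```python
import math
import random

def dickey_fuller_stat(y):
    """t-statistic on rho in  dy_t = alpha + rho * y_{t-1} + e_t.
    Simplified: no lagged differences and no trend term."""
    x = y[:-1]
    dy = [b - a for a, b in zip(y[:-1], y[1:])]
    n = len(x)
    mx, mdy = sum(x) / n, sum(dy) / n
    sxx = sum((v - mx) ** 2 for v in x)
    sxy = sum((v - mx) * (d - mdy) for v, d in zip(x, dy))
    rho = sxy / sxx
    alpha = mdy - rho * mx
    rss = sum((d - alpha - rho * v) ** 2 for v, d in zip(x, dy))
    se_rho = math.sqrt(rss / (n - 2) / sxx)
    return rho / se_rho

rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(500)]  # stationary by construction

walk = [0.0]                                   # random walk: has a unit root
for _ in range(499):
    walk.append(walk[-1] + rng.gauss(0, 1))

stat_noise = dickey_fuller_stat(noise)
stat_walk = dickey_fuller_stat(walk)
print(f"white noise: {stat_noise:.2f}   random walk: {stat_walk:.2f}")
```

A strongly negative statistic is evidence against a unit root. The statistic does not follow the usual t distribution, so it is compared against Dickey-Fuller critical values (roughly -2.86 at the 5% level for this constant-only form).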

Making Solar Data Stationary

To make the data stationary, we make use of first differencing, which follows the formula: \[\hat{y}_t = y_t - y_{t-1}\] In doing so we are looking at the change between the current and previous value, which removes the trend and the slow-moving seasonal swing from the data. The differenced series is plotted below.
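As a small illustration of the formula, the sketch below differences two synthetic series: a straight-line trend, whose differences are constant, and a daily annual cycle, whose day-to-day differences are tiny compared with the cycle itself. The function name and data are illustrative assumptions.

```python
import math

def difference(y):
    """First difference: y_t - y_{t-1} for t = 1 .. n-1."""
    return [b - a for a, b in zip(y[:-1], y[1:])]

# A linear trend becomes a constant after differencing.
trend = [5 + 2 * t for t in range(10)]
trend_diff = difference(trend)

# A slow annual cycle of amplitude 10 becomes a very small daily change.
season = [10 * math.sin(2 * math.pi * t / 365) for t in range(2 * 365)]
season_diff = difference(season)

print(trend_diff)
print(max(abs(d) for d in season_diff))
```

Note that the differenced series is one observation shorter than the original.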

Updated ACF and PACF

The new ACF and PACF plots show how significant this transformation was from a stationarity perspective. Here the ACF shows a significant autocorrelation at a lag of 1 (because it incorporates the previous term in differencing) and then the correlations become insignificant noise. The PACF largely does not change from above, but this transformation nonetheless has taken the data from seasonal to stationary.

Updated Stationarity Tests

As a sanity check, we rerun the ADF test. The original series (radiation) is shown first for comparison, and the differenced series (solar_ts) now rejects non-stationarity.


    Augmented Dickey-Fuller Test

data:  radiation
Dickey-Fuller = -3.3957, Lag order = 17, p-value = 0.05411
alternative hypothesis: stationary

    Augmented Dickey-Fuller Test

data:  solar_ts
Dickey-Fuller = -21.578, Lag order = 17, p-value = 0.01
alternative hypothesis: stationary

California Energy Generation and Consumption

Time Series of CA Solar Energy Consumption

For the rest of this project, we will be focusing solely on solar energy consumption in CA, as it provides a more comprehensive picture of solar energy use than utility-scale generation.

Trend: There is a clear positive trend in the data as solar energy adoption increases over time. This trend appears to be exponential and really began to take off around 2014. California has a long history of providing incentives for renewable energy, including solar. The California Solar Initiative, launched in 2007, provided rebates and incentives to help offset the cost of solar installations. This, combined with declining prices and public demand in a liberal market, caused the explosion in solar demand we see.

Seasonality: The seasonality element of solar energy consumption cannot be ignored. Production and therefore consumption is constrained mainly by solar radiation, which we saw with GHI above is highly seasonal. As a result, most of the power is produced and consumed in the summer months. This is a big limitation on solar power currently, as energy demands usually increase in the winter.

Periodic Variation: There are plenty of instances of variation from the overall seasonal cycle that can be caused by a few factors. Energy prices of alternatives can lead to an increase or decrease from the norm. Economic cycles can also contribute to energy demand as well as the adoption rate of solar panels - these usually work together as people will not invest in solar panels as readily in a recession. Additionally, the same factors that can impact GHI like cloud cover, atmospheric conditions, and solar cycles can impact generation and thus consumption.

Multiplicative Time Series: This is a multiplicative time series because the changes are proportional to the level of the series instead of constant over time. In other words, as the exponentially increasing trend rises, the seasonal peaks and troughs grow with it.
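The distinction between the two model types can be written out explicitly. As a sketch, with \(T_t\), \(S_t\), and \(R_t\) as assumed symbols for the trend, seasonal, and remainder components (they are not named in the original analysis):

\[ y_t = T_t + S_t + R_t \quad \text{(additive)} \qquad y_t = T_t \times S_t \times R_t \quad \text{(multiplicative)} \]

Taking logs turns the multiplicative form into an additive one, which is why a log transform pairs naturally with this kind of series:

\[ \log y_t = \log T_t + \log S_t + \log R_t \]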

Lag Plots

The lag plots begin to show what was clear from the original plot: this data is non-stationary. At all lags there are positive correlations for each month, indicating autocorrelation within the series. Correlations are strongest at lags of 1, 12, and 24, as these represent adjacent months and then calendar-year lags. At lags of 3, 6, and 9 months we still, surprisingly, see positive correlation, indicating that despite seasonality, the trend is so strongly positive that it overcomes the usual peak and trough pattern occurring throughout the seasons.

Decomposition

A multiplicative decomposition was used to model this data, as discussed above. We can see that after 2014 the residuals flip from negative to positive, as that is when the exponential trend really starts to lift off. We can also see the consistent seasonal pattern and the noise centered around a mean of 1, which lends credence to the choice of a multiplicative decomposition.

Moving Average Smoothing

The MA plot makes this exponential trend clear. The 3- and 6-month MAs show the increasing trend along with the seasonality of the data, while the 1-year MA shows only the trend, which has begun to level off from the exponential growth seen since 2014. Without the noise we can see that the seasonal fluctuations have become a bit more intense as the trend has increased. This is an engineering problem that will have to be looked at closely, or solar will need to be complemented with other energy sources, before it can serve as a fully sustainable energy source.

ACF and PACF

The ACF and PACF plots show the lack of stationarity in this data, with strong positive correlations at all lags in the ACF. The PACF decays as the lags get longer, which is typical given that the ACF tails off in step with the lags.

Stationarity Test

The stationarity test confirms our suspicions: with a p-value of 0.099, the ADF test cannot reject non-stationarity.


    Augmented Dickey-Fuller Test

data:  cons
Dickey-Fuller = -3.1473, Lag order = 5, p-value = 0.09869
alternative hypothesis: stationary

Making Solar Energy Consumption Stationary

To make the data stationary we use a similar but slightly different technique from the solar data. We use differencing to remove the seasonality and trend, but first we take the log of the data. This is because, with a multiplicative time series and an exponentially increasing trend, the size of the raw differences grows with the level of the series, especially as the trend takes off. Taking the log is a good way to remove this heteroscedasticity and to look at the data as percentage differences instead of absolute differences.
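The effect of taking logs before differencing can be sketched on a synthetic monthly series with 3% monthly growth and a multiplicative seasonal factor. The growth rate, seasonal amplitude, and function names are assumptions chosen for illustration, not values from the consumption data.

```python
import math

def log_difference(y):
    """log(y_t) - log(y_{t-1}), approximately the percentage change."""
    return [math.log(b) - math.log(a) for a, b in zip(y[:-1], y[1:])]

def spread(values):
    """Range (max minus min) of a list of numbers."""
    return max(values) - min(values)

# Synthetic monthly series: exponential trend times a seasonal factor.
series = [100 * 1.03 ** t * (1 + 0.3 * math.sin(2 * math.pi * t / 12))
          for t in range(120)]

raw_diff = [b - a for a, b in zip(series[:-1], series[1:])]
log_diff = log_difference(series)

# Compare the first two years with the last two years.
early_raw, late_raw = raw_diff[:24], raw_diff[-24:]
early_log, late_log = log_diff[:24], log_diff[-24:]

print(spread(early_raw), spread(late_raw))  # raw differences fan out
print(spread(early_log), spread(late_log))  # log differences keep the same range
```

The raw differences grow with the level of the series, while the log differences keep the same range from start to finish, which is the variance stabilization the log transform is meant to provide.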

Updated ACF and PACF

The new ACF plot shows the large effect this has on the data as now autocorrelation is almost entirely removed except at lags of full years. This data is significantly more stationary, with more constant mean and variance over time.

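Updated Stationarity Tests

The Augmented Dickey-Fuller test once again confirms that this change made the data stationary. The original series (cons) is shown first for comparison, followed by the log-differenced series (cons_d_ts).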


    Augmented Dickey-Fuller Test

data:  cons
Dickey-Fuller = -3.1473, Lag order = 5, p-value = 0.09869
alternative hypothesis: stationary

    Augmented Dickey-Fuller Test

data:  cons_d_ts
Dickey-Fuller = -8.0818, Lag order = 5, p-value = 0.01
alternative hypothesis: stationary

Solar Stocks - SPWR

For the sake of simplicity, SPWR will be the chosen stock for this analysis. SunPower has a strong presence in the state of CA, with a history of providing high-quality solar panels and excellent customer service. Additionally, the company’s focus on innovation and commitment to sustainability positions it well in a market that values efficiency, durability, and environmentally-friendly solutions. Furthermore, SunPower’s strong brand recognition and reputation has made it an industry leader in the state. Below is their stock price chart:

Trend: SPWR stock seems to indicate no real trend, although there are subtrends within the plot that are quite strong. The run-up before 2008 coincided with immense optimism about renewable energy, which was then quickly sapped after the Great Financial Crisis. The stock remained relatively quiet until 2014, when another jump occurred, likely coinciding with the increase in solar energy demand in CA as a result of the California Solar Initiative. This was followed by another downtrend until a massive spike after the lax monetary policy brought on by the Federal Reserve’s COVID-19 response, which saw the entire market rise.

Seasonality: Seasonal factors can also play a role in the stock prices of solar companies. For example, the demand for solar panels and other solar technologies may be higher in the summer months, when there is more daylight and higher levels of solar irradiation, leading to higher stock prices for solar companies during this time. In the case of SPWR, it is not clear from this view that the stock exhibits seasonality.

Periodic Variation: Stock prices are enormously volatile and can go off trend at a moment’s notice. Economic indicators, such as gross domestic product (GDP) growth, unemployment rates, and inflation, can have a significant impact on stock prices. We see an example of the economy impacting the stock price after 2008 and the GFC. Political events, such as elections, government policies, and geopolitical tensions, can also impact stock prices, like the 2014 jump we attribute here to California solar incentive policies such as the California Solar Initiative. Market sentiment, or the overall mood and attitude of investors, can also play a role in stock price variations. This is particularly applicable to solar energy, as ESG investments usually do better in less cautious times.

Multiplicative Time Series: This will be treated as a multiplicative time series as stocks usually experience exponential growth or decay, although SPWR has largely remained flat over its entire public market experience.

Lag Plots

The patterns in this lag plot are quite artistic! We see some positive autocorrelation in prices up to about 90 days before a significant tail-off to no autocorrelation between the prices. The strong autocorrelation at a lag of 1 day is the main takeaway of this plot and a sign that this data is not stationary.

Decomposition

The decomposition shows us two trend bumps coinciding with price spikes. The plot does give insight into the seasonal nature of the stock, which is much more consistent than what originally met the eye. We see consistent price increases from the beginning of the year to about a third of the way through (around April) before the price peaks and declines for the rest of the year. This pattern makes sense, as financial markets usually project 6 months out: if we know that solar consumption increases through the summer and peaks in the late summer, then the stocks in this space should increase from January through March.

Moving Average Smoothing

SPWR stock has been on a downward trend since 2006. The 60- and 180-day MA plots show some of the more major fluctuations the stock has seen, mostly before the GFC, and the 1-year MA shows a very smooth trendline on a downward trajectory. Despite the great momentum solar energy has seen, SPWR has not been able to reap the benefits of it.

ACF and PACF

The ACF and PACF plots show just how non-stationary the stock data is, with significant autocorrelations for lags all the way up to a year and beyond. Stock data is known to be non-stationary, so this and the test below are just confirmation.

Stationarity Test


    Augmented Dickey-Fuller Test

data:  stock_ts
Dickey-Fuller = -2.6121, Lag order = 15, p-value = 0.3191
alternative hypothesis: stationary

Making Stock Data Stationary

First differencing is used to remove trend and seasonality. Just like with the consumption data, we take the log first to remove heteroscedasticity. Taking the log difference of stock prices is particularly common and makes returns symmetric around 0. Arithmetic returns are biased: if you gain 100% on a stock and then lose 50%, you make a full round trip back to where you started, even though the average of the two returns is 25%. Differencing the log prices fixes this problem, so that after a log return of 100% it takes a log return of -100% to make a full round trip. This is helpful both for interpretation and for the mathematical models, as positive and negative numbers of equal magnitude can be treated as equal and opposite moves.
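The round-trip example can be checked with a few lines of arithmetic. The prices below are hypothetical (a stock going from 100 to 200 and back to 100), chosen only to reproduce the +100% / -50% case.

```python
import math

# Hypothetical price path: doubles, then halves back to the start.
prices = [100.0, 200.0, 100.0]

# Arithmetic returns: (p_t - p_{t-1}) / p_{t-1}
arith = [(b - a) / a for a, b in zip(prices[:-1], prices[1:])]

# Log returns: log(p_t) - log(p_{t-1})
log_ret = [math.log(b) - math.log(a) for a, b in zip(prices[:-1], prices[1:])]

mean_arith = sum(arith) / len(arith)  # positive, although nothing was gained
total_log = sum(log_ret)              # zero: the two moves cancel

print("arithmetic returns:", arith, "mean:", mean_arith)
print("log returns:", log_ret, "sum:", total_log)
```

The arithmetic returns average to a positive number despite the price ending where it began, while the log returns are equal and opposite and sum to zero.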

Updated ACF and PACF Plots

After log differencing, we see fully stationary data, with essentially no autocorrelation. This is the ideal outcome of making data stationary, with all lags within the bounds of insignificance.

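Updated Stationarity Tests

The updated Augmented Dickey-Fuller test confirms this result. The original series (stock_ts) is shown first for comparison, followed by the log-differenced series (stock_log_ts).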


    Augmented Dickey-Fuller Test

data:  stock_ts
Dickey-Fuller = -2.6121, Lag order = 15, p-value = 0.3191
alternative hypothesis: stationary

    Augmented Dickey-Fuller Test

data:  stock_log_ts
Dickey-Fuller = -17.75, Lag order = 17, p-value = 0.01
alternative hypothesis: stationary